An OpenMP Runtime for Transparent Work Sharing across Cache-Incoherent Heterogeneous Nodes
Authors
Abstract
In this work, we present libHetMP, an OpenMP runtime for automatically and transparently distributing parallel computation across heterogeneous nodes. libHetMP targets platforms comprising CPUs with different instruction set architectures (ISAs) coupled by a high-speed memory interconnect, where cross-ISA binary incompatibility and non-coherent caches require application data to be marshaled before it can be shared across CPUs. Because of this, work distribution decisions must take into account both the relative compute performance of the asymmetric CPUs and the communication overheads. libHetMP drives workload distribution decisions without programmer intervention by measuring performance characteristics during cross-node execution. A novel HetProbe loop iteration scheduler decides whether cross-node execution is beneficial and either distributes work according to the relative performance of the CPUs when it is, or places all work on the set of homogeneous CPUs providing the best performance when it is not. We evaluate libHetMP using kernels from several benchmark suites and show a geometric mean 41% speedup in execution time. Because some workloads may showcase irregular behavior among iterations, we extend libHetMP with a second scheduler called HetProbe-I. The evaluation of HetProbe-I shows that it can further improve speedup for irregular computation, in some cases by up to 24%, by triggering periodic distribution decisions.
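The abstract describes HetProbe only at a high level. The sketch below illustrates the kind of decision such a scheduler makes, under our own assumptions: a probe phase has already run a slice of the loop on every node and recorded how many iterations each finished and how long it took. `ProbeResult`, `plan_remaining`, and the `min_speedup` threshold are hypothetical names, not libHetMP's API, and the fixed threshold plus proportional split is a simplified stand-in, not the paper's actual policy.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """What a (hypothetical) probe phase measured on one node."""
    node: str          # node label, assumed unique
    iterations: int    # loop iterations the node finished while probing
    seconds: float     # wall-clock time spent on them (assumed > 0)

    @property
    def throughput(self) -> float:
        # Iterations per second. Because it is measured while all nodes run
        # together, it is assumed to fold in that node's marshaling overheads.
        return self.iterations / self.seconds


def plan_remaining(probes: list[ProbeResult], remaining: int,
                   min_speedup: float = 1.1) -> dict[str, int]:
    """Assign the loop's `remaining` iterations to nodes.

    Cross-node execution is used only if the combined throughput of all
    nodes beats the fastest single node by at least `min_speedup`
    (a hypothetical tuning knob); otherwise everything runs on that node.
    """
    best = max(probes, key=lambda p: p.throughput)
    total = sum(p.throughput for p in probes)

    if total / best.throughput < min_speedup:
        # Not worth it: keep all work on the fastest homogeneous node.
        return {p.node: remaining if p is best else 0 for p in probes}

    # Worth it: split in proportion to measured throughput (rounded down),
    # then hand any leftover iterations to the fastest node.
    shares = {p.node: int(remaining * p.throughput / total) for p in probes}
    shares[best.node] += remaining - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Hypothetical x86 node four times faster than an ARM node.
    balanced = [ProbeResult("x86", 1000, 0.25), ProbeResult("arm", 1000, 1.0)]
    print(plan_remaining(balanced, 10_000))  # {'x86': 8000, 'arm': 2000}

    # Same nodes, but ARM iterations are dominated by communication.
    lopsided = [ProbeResult("x86", 1000, 0.25), ProbeResult("arm", 100, 1.0)]
    print(plan_remaining(lopsided, 10_000))  # {'x86': 10000, 'arm': 0}
```

Because the throughputs in this model are measured while both nodes run, a node whose iterations are dominated by data marshaling looks slow, and the loop stays on one node. A HetProbe-I-style variant could, for example, re-run the probe every so many iterations and call `plan_remaining` again; how libHetMP actually triggers its periodic decisions is not specified here.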
Similar resources
A Transparent Runtime Data Distribution Engine for OpenMP
This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of contempo...
Efficient Cache Sharing Protocol for Mobile Nodes
Mobile Ad hoc Network provide an attractive solution for networking in the situations where network infrastructure or service subscription is not available. Its usage can further be extended by enabling communications with external networks such as Internet or cellular networks through gateways. However, data access applications in MANETs suffer from dynamic network connections and restricted e...
An Adaptive Runtime Library for OpenMP on Hyperthreaded SMPs
Hyperthreaded (HT) and simultaneous multithreaded (SMT) processors are now available in commodity workstations and servers. This technology is designed to increase throughput by executing multiple concurrent threads on a single physical processor. These multiple threads share the processor’s functional units and on-chip memory hierarchy in an attempt to make better use of idle resources. Most O...
An Efficient OpenMP Runtime System for Hierarchical Architectures
Exploiting the full computational power of always deeper hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. The emergence of multi-core chips and NUMA machines makes it important to minimize the number of remote memory accesses, to favor cache affinities, and to guarantee fast completion of synchronization...
Journal
Journal title: ACM Transactions on Computer Systems
Year: 2021
ISSN: 1557-7333, 0734-2071
DOI: https://doi.org/10.1145/3505224